Introduction

Spatial interpolation is the process of estimating values at unsampled locations based on measurements from sampled locations. This is crucial in environmental sciences where we often have point measurements (weather stations, soil samples) but need continuous surfaces for analysis.

What You’ll Learn

In this tutorial, you will:

  • Load and visualize point data (weather stations)
  • Understand different interpolation methods
  • Implement Inverse Distance Weighting (IDW)
  • Apply Thin Plate Splines (TPS)
  • Perform Kriging (geostatistical interpolation)
  • Compare interpolation methods using cross-validation
  • Create ensemble predictions

The Dataset

We’ll use weather station data from Denmark (DMI - Danish Meteorological Institute), focusing on Mean Annual Temperature (MAT). The data includes:

  • Station locations (latitude, longitude, elevation)
  • Climate measurements (temperature, precipitation)
  • 200+ weather stations across Denmark

Essential Functions Reference

Task Function Package Key Arguments
Read CSV read.csv() base url/file
Create sf st_as_sf() sf data, coords, crs
Transform CRS st_transform() sf object, crs
Create raster rast() terra object, res
Mask raster mask() terra x, mask
IDW idw() gstat formula, locations, newdata, nmax, idp
TPS Tps() fields x, Y
Variogram variogram() gstat object, cutoff
Fit variogram autofitVariogram() automap formula, input_data, model
Kriging krige() gstat formula, locations, newdata, model
Auto kriging autoKrige() automap formula, input_data, new_data
Cross-validation krige.cv() gstat formula, locations, nfold, model
K-fold kfold() dismo x, k

Common Issues and Solutions

Geometry Errors: If you encounter errors like “Loop is not valid” or “duplicate vertex”, use st_make_valid() to fix invalid geometries in spatial objects before operations like cropping or intersecting.

# Fix invalid geometries
my_spatial_object <- st_make_valid(my_spatial_object)

Part 1: Loading and Exploring Point Data

1.1 Load Required Packages

# Spatial data handling
library(sf)     # Simple features for vector data
library(terra)     # Raster/vector operations
library(gstat)     # Spatial statistics and Kriging

# Data manipulation and visualization
library(dplyr)     # Data wrangling
library(ggplot2)     # Visualization
library(tidyterra)     # Visualizing rasters with ggplot2

# Mapping
library(rnaturalearth)     # Natural Earth map data

# Specialized interpolation
library(fields)     # Thin plate splines
library(dismo)     # Cross-validation
library(sp)     # Spatial classes (older format)
library(automap)     # Automatic variogram fitting
library(raster)     # Convert between raster formats

HINT: Packages are: sf, terra, gstat, dplyr, ggplot2, tidyterra, rnaturalearth, fields, dismo, sp, automap, raster

1.2 Load Weather Station Data

# Load weather station data from online source
DMI.Stations <- read.csv("https://raw.githubusercontent.com/AlejoOrdonez/StaGeoMod2021/main/StationsWithClim.csv")

# View variable names
colnames(DMI.Stations)
##  [1] "X"                      "StationId"              "Name"                  
##  [4] "Country"                "Owner"                  "StationType"           
##  [7] "Status"                 "StationHeightabvMSL.m." "BarometerHeight..m."   
## [10] "Latitude"               "Longitude"              "Region"                
## [13] "Operation.From"         "Country.Code"           "StationId.1"           
## [16] "MAT.C"                  "AnnPrec.mm.yr"
# Check dimensions
dim(DMI.Stations)
## [1] 211  17
# View first few rows
head(DMI.Stations)
##   X StationId            Name Country Owner StationType Status
## 1 1      6141            Abed     DNK   DMI       Synop Active
## 2 2      6079     Anholt Havn     DNK   DMI       Synop Active
## 3 3      6123     Assens/Torø     DNK   DMI       Synop Active
## 4 4      6081 Blåvandshuk Fyr     DNK   DMI       Synop Active
## 5 5      6082          Borris     DNK   DMI       Synop Active
## 6 6      6154       Brandelev     DNK   DMI       Synop Active
##   StationHeightabvMSL.m. BarometerHeight..m. Latitude Longitude Region
## 1                   7.00                9.12  54.8275   11.3292      6
## 2                   2.36                3.55  56.7169   11.5098      6
## 3                   1.65                3.39  55.2444    9.8882      6
## 4                  13.00               17.62  55.5575    8.0828      6
## 5                  25.00                  NA  55.9591    8.6242      6
## 6                  44.52               46.11  55.2075   11.8605      6
##   Operation.From Country.Code StationId.1 MAT.C AnnPrec.mm.yr
## 1     31/08/1999         6080        6141   8.8           582
## 2     20/05/1993         6080        6079   8.5           570
## 3     08/05/2001         6080        6123   8.8           680
## 4     18/09/1980         6080        6081   9.1           757
## 5     17/04/2002         6080        6082   8.6           899
## 6     21/01/2004         6080        6154   8.5           648

TASK: Fill in functions to read CSV from URL, view names, dimensions, and first rows.

1.3 Convert to Spatial Object

# Convert data frame to sf POINT object
DMI.Stations.Shp <- st_as_sf(x = DMI.Stations,
                          coords= c("Longitude", "Latitude"),
                          crs = 4326)  # WGS84 coordinate system

# Check the object
class(DMI.Stations.Shp)
## [1] "sf"         "data.frame"

TASK: Fill in the function to create sf objects, its arguments (data, coords, crs), and coordinate column names.

1.4 Visualize Station Locations

# Load world map
wrld_simpl <- ne_countries(type = "countries", scale = "large")

# Fix invalid geometries (important!)
wrld_simpl <- st_make_valid(wrld_simpl)

# Plot all stations with world map
ggplot() +
  geom_sf(data = wrld_simpl, fill = "gray80", color = "white") +
  geom_sf(data = DMI.Stations.Shp, color = "blue", size = 2) +
  theme_minimal() +
  labs(title = "DMI Weather Stations",
       x = "Longitude", y = "Latitude")

TASK: Fill in function to get country data and its arguments, function to fix geometries, plus geoms for sf polygons and points.

1.5 Extract Denmark Stations Only

# Filter for Denmark only (ISO code: DNK)
DK.Sites <- which(DMI.Stations.Shp$Country == "DNK")

# Subset to Denmark stations
DMI.DK.Shp <- DMI.Stations.Shp[DK.Sites, ]

# View summary
summary(DMI.DK.Shp)
##        X           StationId         Name             Country         
##  Min.   :  1.0   Min.   : 5005   Length:185         Length:185        
##  1st Qu.: 47.0   1st Qu.: 5406   Class :character   Class :character  
##  Median :116.0   Median : 6126   Mode  :character   Mode  :character  
##  Mean   :111.3   Mean   :13035                                        
##  3rd Qu.:165.0   3rd Qu.:22189                                        
##  Max.   :211.0   Max.   :32110                                        
##                                                                       
##     Owner           StationType           Status         
##  Length:185         Length:185         Length:185        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  StationHeightabvMSL.m. BarometerHeight..m.     Region   Operation.From    
##  Min.   : 1.50          Min.   : 2.700      Min.   :6    Length:185        
##  1st Qu.: 7.00          1st Qu.: 6.138      1st Qu.:6    Class :character  
##  Median :18.00          Median :16.500      Median :6    Mode  :character  
##  Mean   :26.15          Mean   :24.757      Mean   :6                      
##  3rd Qu.:38.50          3rd Qu.:41.820      3rd Qu.:6                      
##  Max.   :96.00          Max.   :89.200      Max.   :6                      
##  NA's   :70             NA's   :147         NA's   :70                     
##   Country.Code   StationId.1       MAT.C       AnnPrec.mm.yr  
##  Min.   :6080   Min.   :6019   Min.   :7.900   Min.   :523.0  
##  1st Qu.:6080   1st Qu.:6074   1st Qu.:8.400   1st Qu.:602.8  
##  Median :6080   Median :6124   Median :8.500   Median :671.0  
##  Mean   :6080   Mean   :6118   Mean   :8.494   Mean   :691.7  
##  3rd Qu.:6080   3rd Qu.:6158   3rd Qu.:8.700   3rd Qu.:776.8  
##  Max.   :6080   Max.   :6197   Max.   :9.100   Max.   :921.0  
##  NA's   :138    NA's   :138    NA's   :1       NA's   :1      
##           geometry  
##  POINT        :185  
##  epsg:4326    :  0  
##  +proj=long...:  0  
##                     
##                     
##                     
## 

TASK: Fill in the country column name, ISO code for Denmark, the subsetting index, and summary function.

# Plot Denmark stations with zoom
ggplot() +
  geom_sf(data = wrld_simpl, fill = "gray80", color = "white") +
  geom_sf(data = DMI.DK.Shp, color = "blue", size = 2) +
  xlim(7.9, 15.1) +
  ylim(54.5, 58.0) +
  theme_minimal() +
  labs(title = "DMI Stations in Denmark")

TASK: Fill in the function to set coordinate limits in ggplot.


Part 2: Exploring Climate Data

2.1 Visualize Temperature Distribution

# Order stations by temperature
MAT.OrderVct <- DMI.DK.Shp[order(DMI.DK.Shp$MAT.C), ]

# Create scatter plot
ggplot(MAT.OrderVct, aes(x = X, y = MAT.C)) +
  geom_point() +
  labs(x = "Stations",
       y = "Mean Annual Temperature (°C)",
       title = "Mean Annual Temperature Distribution") +
  theme_minimal()

TASK: Fill in the function to order data, the temperature column, and geom for points.

2.2 Spatial Map of Temperature

# Fix invalid geometries in world map
wrld_simpl <- st_make_valid(wrld_simpl)

# Create Denmark bounding box
denmark_bbox <- st_bbox( c(xmin = 7.9, xmax = 15.1, 
                        ymin = 54.5, ymax = 58.0), 
                      crs = st_crs(4326))

# Crop world map to Denmark
wrld_simpl_denmark <- st_crop(wrld_simpl, denmark_bbox)

# Plot temperature spatially
ggplot() +
  geom_sf(data = wrld_simpl_denmark, fill = "lightgray", color = NA) +
  geom_sf(data = DMI.DK.Shp, aes(color = MAT.C), size = 2) +
  scale_color_gradientn(colors = hcl.colors(100, palette = "Blue-Red"),
                       name = "Temperature (°C)") +
  theme_minimal() +
  labs(title = "Mean Annual Temperature in Denmark")

TASK: Fill in functions to fix invalid geometries, create bounding box, crop spatial data, and the aesthetic for color.

HINT: Use st_make_valid() to fix geometry issues before cropping.


Part 3: Coordinate Transformation

3.1 Transform to Projected CRS

For accurate distance calculations, transform to a projected coordinate system (ETRS89 Lambert Azimuthal Equal-Area).

# Transform stations to projected CRS
DMI.DK.Shp_proj <- st_transform(x = DMI.DK.Shp,
                         crs = 3035)  # EPSG code for ETRS89 LAEA

# Transform Denmark map
wrld_simpl_denmark_proj <- st_transform(wrld_simpl_denmark, 3035)

# Check the new CRS
st_crs(DMI.DK.Shp_proj)
## Coordinate Reference System:
##   User input: EPSG:3035 
##   wkt:
## PROJCRS["ETRS89-extended / LAEA Europe",
##     BASEGEOGCRS["ETRS89",
##         DATUM["European Terrestrial Reference System 1989",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4258]],
##     CONVERSION["Europe Equal Area 2001",
##         METHOD["Lambert Azimuthal Equal Area",
##             ID["EPSG",9820]],
##         PARAMETER["Latitude of natural origin",52,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",10,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["False easting",4321000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",3210000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (Y)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (X)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Europe - LCC & LAEA"],
##         BBOX[24.6,-35.58,84.17,44.83]],
##     ID["EPSG",3035]]

TASK: Fill in the transformation function and its arguments (object, crs), and function to check CRS.

3.2 Plot with Projected Coordinates

# Plot with projected coordinates
ggplot() +
  geom_sf(data = wrld_simpl_denmark_proj, fill = "lightgrey", color = NA) +
  geom_sf(data = DMI.DK.Shp_proj, aes(color = MAT.C), size = 2) +
  scale_color_gradientn(colors = hcl.colors(100, palette = "Blue-Red"),
                       name = "Temperature (°C)") +
  theme_minimal() +
  labs(title = "Mean Annual Temperature (Projected)")


Part 4: Null Model Baseline

4.1 Create RMSE Function

# Function to calculate Root Mean Square Error
RMSE.Fnc <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2, na.rm = TRUE))
}

TASK: Fill in the functions: outer function (square root), inner function (mean), and subtraction components.

4.2 Calculate Null Model

The null model assigns the mean temperature to all locations.

# Assign mean as prediction (null model)
DMI.DK.Shp_proj$Null.MAT <- mean(DMI.DK.Shp_proj$MAT.C, na.rm = TRUE)

# Calculate RMSE for null model
null.RMSE.MAT <- RMSE.Fnc(observed = DMI.DK.Shp_proj$MAT.C,
                       predicted = DMI.DK.Shp_proj$Null.MAT)

# Print null model RMSE
print(paste("Null Model RMSE:", round(null.RMSE.MAT, 3)))
## [1] "Null Model RMSE: 0.268"

TASK: Fill in the mean function, RMSE function, and its arguments (observed, predicted).


Part 5: Inverse Distance Weighting (IDW)

5.1 Understanding IDW

Inverse Distance Weighting assumes nearby points are more similar than distant points. It weights observations by the inverse of their distance to the prediction location.

Key parameters: - nmax: Number of nearest neighbors to use - idp: Inverse distance power (0 = simple average, 2 = default weighting)

5.2 Create Prediction Grid

# Create raster template
DK.SpaGrid.1km <- rast(wrld_simpl_denmark_proj, res = 1000)  # 1km resolution

# Initialize with values
values(DK.SpaGrid.1km) <- 1

# Mask ocean areas
DK.SpaGrid.1km <- mask(x = DK.SpaGrid.1km,
                        mask = wrld_simpl_denmark_proj)

# Convert to SpatialGridDataFrame for gstat
DK.SpaGrid.1km <- as(raster::raster(DK.SpaGrid.1km), 
                     'SpatialGridDataFrame')

TASK: Fill in functions to create raster, set values, mask, and the class name for spatial grid.

5.3 Perform IDW Interpolation

# Remove NA values
DMI.DK.Shp_proj_noNA <- DMI.DK.Shp_proj[!is.na(DMI.DK.Shp_proj$MAT.C), ]

# Perform IDW
library(gstat)
idw.gstat <- idw(formula = MAT.C ~ 1,
                   locations = as(DMI.DK.Shp_proj_noNA, "Spatial"),
                   newdata = DK.SpaGrid.1km,
                   nmax = 10,    # Use 10 nearest neighbors
                   idp = 0)     # Equal weighting (no distance decay)
## [inverse distance weighted interpolation]
# Convert to SpatRaster
IDW.MAT.1km <- rast(idw.gstat)

# Plot result
library(tidyterra)
ggplot() +
  geom_spatraster(data = IDW.MAT.1km[[1]]) +
  scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
  geom_sf(data = st_as_sf(wrld_simpl_denmark_proj), 
          fill = NA, color = "black") +
  theme_minimal() +
  labs(title = "IDW Interpolation - Mean Annual Temperature")

TASK: Fill in the temperature column, IDW function, its arguments (formula, locations, newdata, nmax, idp), rast conversion function, and geom for rasters.

5.4 Cross-Validation for IDW

# Set seed for reproducibility
set.seed(5132015)

# Create 5-fold cross-validation groups
library(dismo)
kf <- kfold(x = DMI.DK.Shp_proj, k = 5)

# Vector to store RMSE values
RMSE.IDW.MAT <- rep(NA, 5)

# Loop through folds
for (k in 1:5) {
  # Create test and train sets
  test <- DMI.DK.Shp_proj[kf == k, ]
  train <- DMI.DK.Shp_proj[kf != k, ]
  
  # Build gstat model
  gs <- gstat(formula = MAT.C ~ 1,
             data = train[!is.na(train$MAT.C), ],
             nmax = 10,
             set = list(idp = 0))
  
  # Predict for test locations
  p <- predict(gs, test)
  
  # Calculate RMSE
  RMSE.IDW.MAT[k] <- RMSE.Fnc(test$MAT.C, p$var1.pred)
}
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
# Print results
print(RMSE.IDW.MAT)
## [1] 0.18781100 0.14695099 0.11099299 0.10186106 0.09179884
print(paste("Mean RMSE:", round(mean(RMSE.IDW.MAT), 3)))
## [1] "Mean RMSE: 0.128"
# Calculate improvement over null model
RMSE.IDW.MAT.Imp <- 1 - (mean(RMSE.IDW.MAT) / null.RMSE.MAT)
print(paste("Improvement over null:", round(RMSE.IDW.MAT.Imp * 100, 1), "%"))
## [1] "Improvement over null: 52.3 %"

TASK: Fill in the kfold function and its arguments, fold indices, gstat function and arguments, predict function, and prediction column name.


Part 6: Thin Plate Splines (TPS)

6.1 Understanding TPS

Thin Plate Splines fit a smooth surface through data points, like bending a thin metal sheet. They minimize curvature while fitting the data, creating smooth interpolations.

6.2 Fit TPS Model

# Load fields package
library(fields)

# Fit TPS model
TPS.MAT <- Tps(x = st_coordinates(DMI.DK.Shp_proj),  # Coordinates matrix
                Y = DMI.DK.Shp_proj$MAT.C)  # Response variable
## Warning: 
## Grid searches over lambda (nugget and sill variances) with  minima at the endpoints: 
##   (GCV) Generalized Cross-Validation 
##    minimum at  right endpoint  lambda  =  9.608608e-11 (eff. df= 158.65 )
# Generate predictions
TPS.MAT.Pred <- interpolate(object = raster(DK.SpaGrid.1km),
                      model = TPS.MAT)

# Mask ocean
TPS.MAT.Pred <- mask(TPS.MAT.Pred, wrld_simpl_denmark_proj)

# Convert to SpatRaster
TPS.MAT.1km <- rast(TPS.MAT.Pred)

# Plot result
library(tidyterra)
ggplot() +
  geom_spatraster(data = TPS.MAT.1km) +
  scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
  geom_sf(data = st_as_sf(wrld_simpl_denmark_proj), 
          fill = NA, color = "black") +
  theme_minimal() +
  labs(title = "TPS Interpolation - Mean Annual Temperature")

TASK: Fill in the TPS function, coordinate extraction function, its arguments (x, Y), interpolate function, and its arguments.

6.3 Cross-Validation for TPS

# Set seed
set.seed(5132015)

# Create folds
kf <- kfold(x = DMI.DK.Shp_proj, k = 5)

# Vector for RMSE
RMSE.TPS.MAT <- rep(NA, 5)

# Loop through folds
for (k in 1:5) {
  test <- DMI.DK.Shp_proj[kf == k, ]
  train <- DMI.DK.Shp_proj[kf != k, ]
  
  # Fit TPS model
  Tps.mod <- Tps(x = st_coordinates(train),
                   Y = train$MAT.C)
  
  # Predict
  p <- predict(Tps.mod, st_coordinates(test))
  
  # Calculate RMSE
  RMSE.TPS.MAT[k] <- RMSE.Fnc(test$MAT.C, p)
}
## Warning: 
## Grid searches over lambda (nugget and sill variances) with  minima at the endpoints: 
##   (GCV) Generalized Cross-Validation 
##    minimum at  right endpoint  lambda  =  1.310831e-10 (eff. df= 130.15 )
## Warning: 
## Grid searches over lambda (nugget and sill variances) with  minima at the endpoints: 
##   (GCV) Generalized Cross-Validation 
##    minimum at  right endpoint  lambda  =  1.940061e-10 (eff. df= 129.2 )
## Warning: 
## Grid searches over lambda (nugget and sill variances) with  minima at the endpoints: 
##   (GCV) Generalized Cross-Validation 
##    minimum at  right endpoint  lambda  =  2.210463e-10 (eff. df= 130.1501 )
## Warning: 
## Grid searches over lambda (nugget and sill variances) with  minima at the endpoints: 
##   (GCV) Generalized Cross-Validation 
##    minimum at  right endpoint  lambda  =  1.284304e-10 (eff. df= 127.3 )
## Warning: 
## Grid searches over lambda (nugget and sill variances) with  minima at the endpoints: 
##   (GCV) Generalized Cross-Validation 
##    minimum at  right endpoint  lambda  =  1.52432e-10 (eff. df= 131.1 )
# Print results
print(RMSE.TPS.MAT)
## [1] 0.09009537 0.10918297 0.15433318 0.13928009 0.06007873
print(paste("Mean RMSE:", round(mean(RMSE.TPS.MAT), 3)))
## [1] "Mean RMSE: 0.111"
# Calculate improvement
RMSE.TPS.MAT.Imp <- 1 - (mean(RMSE.TPS.MAT) / null.RMSE.MAT)
print(paste("Improvement over null:", round(RMSE.TPS.MAT.Imp * 100, 1), "%"))
## [1] "Improvement over null: 58.7 %"

TASK: Fill in Tps function, its arguments, predict function, and coordinate extraction for test data.


Part 7: Kriging (Geostatistical Interpolation)

7.1 Understanding Kriging

Kriging uses spatial autocorrelation structure (semivariogram) to optimally weight nearby observations. It provides: - Best linear unbiased predictions - Prediction uncertainty estimates

7.2 Check for Duplicate Locations

# Convert to Spatial object
DMI.DK.Shp_proj_SP <- as(DMI.DK.Shp_proj, "Spatial")

# Check for zero-distance points
zerodist(DMI.DK.Shp_proj_SP)
##       [,1] [,2]
##  [1,]   50  119
##  [2,]   53  123
##  [3,]   56  126
##  [4,]   63  132
##  [5,]   67  134
##  [6,]   71  137
##  [7,]   75  140
##  [8,]   81  148
##  [9,]   90  157
## [10,]   94  161
## [11,]   95  162
## [12,]  100  168
## [13,]  101  169
## [14,]  102  170
## [15,]  104  174
## [16,]  105  177
## [17,]   42  182
# Remove duplicates if any
DMI.DK.Shp_proj2 <- DMI.DK.Shp_proj_SP[-zerodist(DMI.DK.Shp_proj_SP)[, 1], ]

# Check again
zerodist(DMI.DK.Shp_proj2)
##      [,1] [,2]

TASK: Fill in the Spatial class name and the function to check zero distances.

7.3 Calculate Semivariogram

# Create gstat object
MAT.gstat <- gstat(formula = MAT.C ~ 1,
                   data = DMI.DK.Shp_proj_SP[!is.na(DMI.DK.Shp_proj_SP$MAT.C), ])

# Calculate sample variogram
MAT.Variog <- variogram(MAT.gstat, cutoff = 200000)  # 200 km cutoff

# View variogram
head(MAT.Variog)
##    np      dist      gamma dir.hor dir.ver   id
## 1 120  4580.958 0.00437500       0       0 var1
## 2 311 21739.396 0.02249196       0       0 var1
## 3 519 33041.200 0.02368015       0       0 var1
## 4 669 46696.089 0.03146487       0       0 var1
## 5 811 59989.650 0.04115290       0       0 var1
## 6 894 73625.525 0.05159396       0       0 var1
# Plot variogram
plot(MAT.Variog)

TASK: Fill in gstat function and arguments, variogram function, head function, and plot function.

7.4 Fit Theoretical Variogram

# Automatic variogram fitting
library(automap)
MAT.Exp.Variog <- autofitVariogram(formula = MAT.C ~ 1,
                        input_data = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ],
                        model = c("Sph", "Exp", "Gau", "Nug", "Lin"))

# View fitted model
MAT.Exp.Variog
## $exp_var
##      np        dist        gamma dir.hor dir.ver   id
## 1    53    739.3034 0.0006603774       0       0 var1
## 2     8   5760.3361 0.0062500000       0       0 var1
## 3    19  10282.7241 0.0144736842       0       0 var1
## 4    63  14933.1670 0.0236507937       0       0 var1
## 5   142  21371.2527 0.0228521127       0       0 var1
## 6   179  26821.9844 0.0210335196       0       0 var1
## 7   760  40485.4046 0.0286578947       0       0 var1
## 8   958  60047.1998 0.0430219207       0       0 var1
## 9  1832  85197.6066 0.0665447598       0       0 var1
## 10 1846 114625.0535 0.0848456121       0       0 var1
## 11 1718 144017.3889 0.0842287544       0       0 var1
## 12 2330 178465.6679 0.0842296137       0       0 var1
## 
## $var_model
##   model        psill    range
## 1   Nug 0.0000708596      0.0
## 2   Sph 0.0867150862 154583.3
## 
## $sserr
## [1] NA
## 
## attr(,"class")
## [1] "autofitVariogram" "list"
# Plot sample variogram with fitted model
plot(MAT.Variog, 
     MAT.Exp.Variog[[2]],
     main = "Semivariogram with Fitted Model")

TASK: Fill in autofitVariogram function and its arguments (formula, input_data, model), and objects to plot.

7.5 Perform Kriging

Method 1: Using gstat and predict

# Recreate spatial grid
DK.SpaGrid.1km <- raster(wrld_simpl_denmark_proj)
res(DK.SpaGrid.1km) <- 1000
DK.SpaGrid.1km[] <- 1
DK.SpaGrid.1km <- mask(DK.SpaGrid.1km, wrld_simpl_denmark_proj)
DK.SpaGrid.1km <- as(DK.SpaGrid.1km, 'SpatialGridDataFrame')

# Create gstat object with variogram
gstat.MAT <- gstat(formula = MAT.C ~ 1,
                   locations = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ],
                   model = MAT.Exp.Variog[[2]])

# Predict
Pred.gstat.MAT <- predict(object = gstat.MAT,
                        newdata = DK.SpaGrid.1km)
## [using ordinary kriging]
# Convert to raster
Krig.MAT <- rast(Pred.gstat.MAT)

# Plot
ggplot() +
  geom_spatraster(data = Krig.MAT[[1]]) +
  scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
  geom_sf(data = st_as_sf(wrld_simpl_denmark_proj), 
          fill = NA, color = "black") +
  theme_minimal() +
  labs(title = "Ordinary Kriging - Mean Annual Temperature")

TASK: Fill in gstat function and arguments (formula, locations, model), predict function and arguments, and rast conversion.

Method 2: Using krige function

# Direct kriging
Pred.krige.MAT <- krige(formula = MAT.C ~ 1,
                        locations = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ],
                        newdata = DK.SpaGrid.1km,
                        model = MAT.Exp.Variog[[2]])
## [using ordinary kriging]
# Convert and plot
Krig.MAT <- rast(Pred.krige.MAT)

ggplot() +
  geom_spatraster(data = Krig.MAT[[1]]) +
  scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
  geom_sf(data = st_as_sf(wrld_simpl_denmark_proj), 
          fill = NA, color = "black") +
  theme_minimal() +
  labs(title = "Kriging - Mean Annual Temperature")

TASK: Fill in krige function and its arguments (formula, locations, newdata, model).

Method 3: Auto Kriging

# Automatic kriging with automap
Pred.autoKrige.MAT <- autoKrige(formula = MAT.C ~ 1,
                            input_data = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ],
                            new_data = DK.SpaGrid.1km)
## [using ordinary kriging]
# Convert and plot
Krig.MAT <- rast(Pred.autoKrige.MAT[[1]])

ggplot() +
  geom_spatraster(data = Krig.MAT[[1]]) +
  scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
  geom_sf(data = st_as_sf(wrld_simpl_denmark_proj), 
          fill = NA, color = "black") +
  theme_minimal() +
  labs(title = "Auto Kriging - Mean Annual Temperature")

TASK: Fill in autoKrige function and its arguments (formula, input_data, new_data).

7.6 Cross-Validation for Kriging

# Kriging cross-validation
Cros.Val.MAT <- krige.cv(formula = MAT.C ~ 1,
                      locations = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ],
                      nfold = 5,
                      model = MAT.Exp.Variog[[2]])

# Calculate RMSE for each fold
RMSE.Kig.MAT <- sapply(1:5, function(i) {
  RMSE.Fnc(
    observed = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ][Cros.Val.MAT$fold == i, ]$MAT.C,
    predicted = Cros.Val.MAT$var1.pred[Cros.Val.MAT$fold == i]
  )
})

# Print results
print(RMSE.Kig.MAT)
## [1] 0.10652107 0.13674795 0.08725594 0.09190339 0.05868823
print(paste("Mean RMSE:", round(mean(RMSE.Kig.MAT), 3)))
## [1] "Mean RMSE: 0.096"
# Calculate improvement
RMSE.Kig.MAT.Imp <- 1 - (mean(RMSE.Kig.MAT) / null.RMSE.MAT)
print(paste("Improvement over null:", round(RMSE.Kig.MAT.Imp * 100, 1), "%"))
## [1] "Improvement over null: 64.1 %"

TASK: Fill in krige.cv function and its arguments (formula, locations, nfold, model), and the prediction column name.


Part 8: Model Comparison

8.1 Compare Performance

# Create comparison table
RMSE.sum <- data.frame(
  Model = c("Null", "IDW", "TPS", "Kriging"),
  Mean_RMSE = c(null.RMSE.MAT, 
                mean(RMSE.IDW.MAT),
                mean(RMSE.TPS.MAT),
                mean(RMSE.Kig.MAT)),
  Improvement = c(0,
                 RMSE.IDW.MAT.Imp * 100,
                 RMSE.TPS.MAT.Imp * 100,
                 RMSE.Kig.MAT.Imp * 100)
)

# Print table
print(RMSE.sum)
##     Model  Mean_RMSE Improvement
## 1    Null 0.26807916     0.00000
## 2     IDW 0.12788298    52.29656
## 3     TPS 0.11059407    58.74574
## 4 Kriging 0.09622331    64.10638
# Visualize comparison
ggplot(RMSE.sum[-1, ], aes(x = Model, y = Improvement)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme_minimal() +
  labs(title = "Model Performance Comparison",
       y = "Improvement over Null Model (%)",
       x = "Interpolation Method")

TASK: Fill in null RMSE, RMSE vectors for each method, improvement values, and geom for bars.

8.2 Determine Best Model

Question: Which interpolation method performed best? Why do you think this is the case?

Answer space: - Best method: Kriging - Reason: Because it improved the precision of predictions (lowered RMSE) the best compared to the null model (where prediction = mean).


Part 9: Ensemble Prediction

9.1 Create Weighted Ensemble

# Create weights based on RMSE (inverse weighting)
RMSE.w <- data.frame(
  Model = c("IDW", "TPS", "Kriging"),
  MAT.RMSE = c(mean(RMSE.IDW.MAT),
               mean(RMSE.TPS.MAT),
               mean(RMSE.Kig.MAT))
)

# Calculate weights (inverse proportion)
RMSE.w$w <- (1 / RMSE.w$MAT.RMSE) / sum(1 / RMSE.w$MAT.RMSE)

print(RMSE.w)
##     Model   MAT.RMSE         w
## 1     IDW 0.12788298 0.2869152
## 2     TPS 0.11059407 0.3317680
## 3 Kriging 0.09622331 0.3813168

TASK: Fill in mean function for each RMSE vector and sum function.

9.2 Create Raster Stack

# Stack all predictions
MAT.Stack <- c(IDW.MAT.1km[[1]],
               rast(TPS.MAT.Pred),
               project(Krig.MAT, IDW.MAT.1km)[[1]])

# Name layers
names(MAT.Stack) <- c("IDW", "TPS", "Kriging")

print(MAT.Stack)
## class       : SpatRaster 
## size        : 397, 446, 3  (nrow, ncol, nlyr)
## resolution  : 1000.927, 1000.53  (x, y)
## extent      : 4200694, 4647107, 3491165, 3888375  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs 
## source(s)   : memory
## names       :  IDW,      TPS,  Kriging 
## min values  : 8.01, 7.897239, 7.904867 
## max values  : 8.94, 9.325737, 9.089675

TASK: Fill in function to convert to raster and function to set names.

9.3 Calculate Ensemble Prediction

# Weighted ensemble
MAT.Ensembl <- sum(MAT.Stack * RMSE.w$w)

# Plot ensemble
ggplot() +
  geom_spatraster(data = MAT.Ensembl) +
  scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
  geom_sf(data = st_as_sf(wrld_simpl_denmark_proj), 
          fill = NA, color = "black") +
  theme_minimal() +
  labs(title = "Ensemble Prediction - Mean Annual Temperature")

TASK: Fill in function to sum rasters and the stack object.

9.4 Compare All Methods

# Add ensemble to stack
MAT.Stack <- c(MAT.Stack, MAT.Ensembl)
names(MAT.Stack)[4] <- "Ensemble"

# Plot all methods
# Plot all methods
ggplot() +
  geom_spatraster(data = MAT.Stack) +
  facet_wrap(~lyr) +
  scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
  geom_sf(data = st_as_sf(wrld_simpl_denmark_proj), 
          fill = NA, color = "black") +
  theme_minimal() +
  labs(title = "All Predictions - Mean Annual Temperature")


Summary

Key Concepts Covered

Point data preparation - Loading and visualizing station data
Coordinate transformation - Converting to projected CRS
Null model - Establishing baseline performance
IDW - Distance-based weighting interpolation
TPS - Smooth surface fitting
Kriging - Geostatistical interpolation using semivariograms
Cross-validation - Assessing model performance
Model comparison - Selecting optimal method
Ensemble modeling - Combining multiple methods

Interpolation Methods Comparison

Method Pros Cons Best For
IDW Simple, fast Doesn’t smooth, edge effects Quick estimates
TPS Smooth surfaces Can oversmooth Gradual transitions
Kriging Optimal weights, uncertainty Complex, needs variogram High accuracy needed
Ensemble Robust, balanced Complex Critical applications

When to Use Each Method

Use IDW when: - Need quick results - Data is relatively uniform - Computational resources limited

Use TPS when: - Want smooth surfaces - Gradual spatial patterns - No strong directional trends

Use Kriging when: - Need highest accuracy - Have sufficient data points - Want uncertainty estimates - Spatial autocorrelation is important

Use Ensemble when: - Maximum reliability needed - Multiple methods perform similarly - Want to hedge against method-specific biases


Challenge Exercises

Challenge 1: Different Variable

Repeat the analysis using Annual Precipitation instead of Mean Annual Temperature. Which method works best? Is it different from temperature?

Challenge 2: Resolution Effects

Try different grid resolutions (500m, 2000m, 5000m). How does resolution affect: - Computational time? - Visual appearance? - Cross-validation RMSE?

Challenge 3: Variogram Models

In the Kriging section, try fitting different variogram models manually: - Spherical - Exponential
- Gaussian

Which provides the best fit?

Challenge 4: Prediction Uncertainty

Extract and map the prediction variance from your Kriging model. Where is uncertainty highest?


Additional Resources